Prediction of Protein Subcellular Multi-locations with a Min-Max Modular Support Vector Machine
Abstract
How to predict the subcellular multi-locations of proteins with machine learning techniques is a challenging problem in the computational biology community. Treating protein multi-location prediction as a multi-label pattern classification problem, we propose in this paper a new method for protein subcellular localization. The method rests on two key points: dividing a severely imbalanced multi-location problem into a number of more balanced two-class subproblems with the part-versus-part task decomposition approach, and learning all of the subproblems with the min-max modular support vector machine (M³-SVM). To evaluate the effectiveness of the proposed method, we perform experiments on a yeast protein data set using two task decomposition strategies and three feature extraction methods. The experimental results show that our method achieves the highest prediction accuracy among the compared methods, substantially better than that of the existing approach based on the traditional support vector machine.
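The abstract names two ingredients: part-versus-part task decomposition and min-max combination of the resulting modules. The sketch below illustrates them under stated assumptions. It handles one binary subproblem (one location versus the rest) and splits each class at random into chunks of at most `chunk_size` samples. It then trains one module per pair of chunks and combines the module outputs with MIN over negative chunks followed by MAX over positive chunks. To keep it dependency-free, a nearest-centroid scorer stands in for the SVM. The helper names (`split_into_chunks`, `train_centroid_classifier`, `train_min_max`) and the toy data are hypothetical and are not taken from the paper.

```python
import random

def split_into_chunks(samples, chunk_size, rng):
    """Randomly partition samples into chunks of at most chunk_size items."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return [shuffled[i:i + chunk_size] for i in range(0, len(shuffled), chunk_size)]

def train_centroid_classifier(pos, neg):
    """Stand-in base learner (nearest centroid), used here in place of an SVM.

    Returns a scoring function that is positive when x is closer to the
    positive-chunk centroid than to the negative-chunk centroid."""
    def mean(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]
    c_pos, c_neg = mean(pos), mean(neg)
    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return d_neg - d_pos
    return score

def train_min_max(pos, neg, chunk_size, seed=0):
    """Part-versus-part decomposition plus min-max combination for one binary problem."""
    rng = random.Random(seed)
    pos_chunks = split_into_chunks(pos, chunk_size, rng)
    neg_chunks = split_into_chunks(neg, chunk_size, rng)
    # One module per (positive chunk, negative chunk) pair.
    modules = [[train_centroid_classifier(p, n) for n in neg_chunks] for p in pos_chunks]
    def predict(x):
        # MIN over modules sharing a positive chunk, then MAX across positive chunks.
        return max(min(m(x) for m in row) for row in modules)
    return predict, len(pos_chunks) * len(neg_chunks)

# Toy data: 3 "in this location" vs. 8 "elsewhere" samples (2-D stand-in features).
pos = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
neg = [(-1.0 - 0.1 * i, -1.0 + 0.05 * i) for i in range(8)]
predict, n_modules = train_min_max(pos, neg, chunk_size=2)
print(n_modules)  # → 8
print(predict((1.0, 1.0)) > 0, predict((-1.0, -1.0)) > 0)  # → True False
```

With 2 positive chunks and 4 negative chunks, the problem becomes 8 small, roughly balanced subproblems. For the multi-location setting, one such classifier could be trained per location, and a protein assigned every location whose combined score is positive. That multi-label wrapper is likewise an assumption, not a detail stated in the abstract.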
Related articles
Prediction of Protein Subcellular Multi-localization by Using a Min-Max Modular Support Vector Machine
Prediction of protein subcellular location is an important issue in computational biology because it provides important clues for characterization of protein function. Currently, much effort has been dedicated to developing automatic prediction tools. However, most of them focus on mono-locational proteins. It should be noted that many proteins bear multi-locational characteristics, and they ca...
Multi-View Face Recognition with Min-Max Modular Support Vector Machines
As a result of statistical learning theory, support vector machines (SVMs) [23] are effective classifiers for classification problems. SVMs have been successfully applied to various pattern classification problems, such as handwritten digit recognition, text categorization and face detection, due to their powerful learning ability and good generalization ability. However, SVMs require to sol...
Prediction of protein subcellular locations by support vector machines using compositions of amino acids and amino acid pairs
MOTIVATION The subcellular location of a protein is closely correlated to its function. Thus, computational prediction of subcellular locations from the amino acid sequence information would help annotation and functional prediction of protein coding genes in complete genomes. We have developed a method based on support vector machines (SVMs). RESULTS We considered 12 subcellular locations in...
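The title above refers to compositions of amino acids and of amino acid pairs as SVM input features. Here is a sketch that assumes the common definitions: the frequency of each of the 20 standard residues, and the frequency of each of the 400 ordered adjacent residue pairs, with non-standard letters skipped. The function names are hypothetical. The exact normalization used in the cited work is not stated here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def amino_acid_composition(seq):
    """Fraction of each standard amino acid in seq; non-standard letters are ignored."""
    counts = {a: 0 for a in AMINO_ACIDS}
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among adjacent pairs in seq."""
    seq = seq.upper()
    counts = {d: 0 for d in DIPEPTIDES}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard letters are skipped
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]

seq = "MKTAYIAKQR"  # arbitrary example sequence, not from any data set
features = amino_acid_composition(seq) + dipeptide_composition(seq)
print(len(features))  # → 420
```

Concatenating the two vectors gives a fixed-length 420-dimensional input regardless of sequence length, which is what lets a standard SVM consume sequences of varying length.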
Support vector machine approach for protein subcellular localization prediction
MOTIVATION Subcellular localization is a key functional characteristic of proteins. A fully automatic and reliable prediction system for protein subcellular localization is needed, especially for the analysis of large-scale genome sequences. RESULTS In this paper, Support Vector Machine has been introduced to predict the subcellular localization of proteins from their amino acid compositions....
Learning from imbalanced data sets with a Min-Max modular support vector machine
Imbalanced data sets have significantly unequal distributions between classes. This between-class imbalance causes conventional classification methods to favor majority classes, resulting in very low or even no detection of minority classes. A Min-Max modular support vector machine (M³-SVM) approaches this problem by decomposing the training input sets of the majority classes into subsets of sim...
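The decomposition described above can be sketched as follows. The sketch assumes the majority class is split at random into subsets no larger than the minority class. Each subset is then paired with the minority class to form one roughly balanced two-class subproblem. The function `balanced_subsets` is a hypothetical helper, and random round-robin assignment is one simple choice, not necessarily the one used in the cited work.

```python
import math
import random

def balanced_subsets(majority, minority_size, seed=0):
    """Randomly split the majority class into subsets of near-equal size.

    Uses the smallest number of subsets for which no subset exceeds minority_size."""
    k = max(1, math.ceil(len(majority) / minority_size))
    shuffled = list(majority)
    random.Random(seed).shuffle(shuffled)
    # Round-robin slicing keeps subset sizes within one element of each other.
    return [shuffled[i::k] for i in range(k)]

# Example: 100 majority samples vs. 15 minority samples -> 7 subproblems,
# each pairing 14 or 15 majority samples with the 15 minority samples.
subsets = balanced_subsets(list(range(100)), minority_size=15)
print([len(s) for s in subsets])  # → [15, 15, 14, 14, 14, 14, 14]
```

Each resulting subproblem has an imbalance ratio of at most 1:1 here, instead of the original 100:15. The price is training more, but smaller, classifiers.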